Abstract. - Network-based recommendation algorithms for user-object link prediction have
developed significantly in recent years. For bipartite graphs, the reallocation of resource
in such algorithms is analogous to heat spreading (HeatS) or probability spreading (ProbS)
processes. The best algorithm to date is a hybrid of the HeatS and ProbS techniques with
homogeneous initial resource configurations, which achieves high accuracy and large diversity
simultaneously. We investigate the effect of heterogeneity in the initial configurations on the
HeatS+ProbS hybrid algorithm and find that both recommendation accuracy and diversity can be
further improved in this new setting. Numerical experiments show that the improvement is robust.
Introduction. - In recent years, the huge data sets available in the natural, social and
information sciences have fuelled the flourishing of complex network analysis [1, 2]. In most
cases, the data are recorded as snapshots. The underlying mechanisms of network evolution are
usually unknown, even when the growth dynamics of the networks are recorded. Therefore, one has
to predict missing links in incompletely recorded networks, or future links, which has important
scientific and practical significance [H-IB]. To date, various methods have been proposed and
developed for link prediction in different fields [7-13].

As a special case of complex networks, bipartite graphs are quite common, especially in the
social sciences. In everyday life, people buy books, daily necessities and food from online or
convenience stores, collect online movies and music, choose restaurants and resorts, invest in
stocks and derivatives, and so on [Tl]. In medical science, scientists try to unveil the unknown
interaction mechanisms between huge numbers of drugs and targets [15], and predicting possible
drug-target links is of crucial importance in drug design. It is often necessary to make choices
without sufficient personal experience of the alternatives. Recommender systems are mainly aimed
at providing link predictions for such systems.

There are many recommender systems designed for different applications. One of the most
successful methods is based on the collaborative filtering technique [16], which has a large
number of variants [17] and hybrids [18]. Recently, much effort in the physics community has
been devoted to designing recommendation algorithms on bipartite graphs [15-25], among which the
hybrid algorithm combining the heat spreading (HeatS) and probability spreading (ProbS)
algorithms is found to achieve simultaneously higher recommendation accuracy and greater
diversity [23]. In this work, we propose an improved HeatS+ProbS algorithm by considering the
heterogeneity in initial resource configurations.

Algorithms. - Generally, recommender systems are designed based on bipartite user-object graphs
$G(u, o, E)$, which contain users $u = \{u_1, u_2, \ldots, u_m\}$, objects
$o = \{o_1, o_2, \ldots, o_n\}$, and links $E = \{e_{i\alpha} : u_i \in u, o_\alpha \in o\}$. A
link is drawn between $u_i$ and $o_\alpha$ if user $u_i$ has collected object $o_\alpha$. For
readability, we use $i, j$ for the subscripts of users and $\alpha, \beta$ for objects. The
user-object bipartite graph can be represented by an $m \times n$ adjacency matrix $A$, where
$a_{i\alpha} = 1$ if user $u_i$ has collected object $o_\alpha$, and $a_{i\alpha} = 0$ otherwise.

The resource reallocation process for each user in the network-based recommendation algorithms
can be expressed using a single equation

$$ f = W f_0, \qquad (1) $$

where $f_0 = [f_{0,1}, \ldots, f_{0,n}]$ is the initial configuration of resource on objects, $W$
is the resource reallocation matrix, and $f = [f_1, \ldots, f_n]$ is the final configuration of
resource on objects. The objects are sorted in descending order of final resource, and a certain
number of objects with the highest final resources that have not been collected by user $u_i$
are recommended to that user. Once the resource reallocation matrix $W$ and the initial
configuration $f_0$ on objects are known, the recommendation algorithm is determined.
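
The recommendation step described above can be sketched as follows. This is a minimal illustration in plain Python, not the authors' code: the function name `recommend` and the toy matrix are hypothetical, and the numbers are chosen only to make the ranking easy to follow.

```python
def recommend(W, f0, collected, L):
    """Return the indices of the top-L uncollected objects for one user.

    W         -- n x n resource reallocation matrix (list of rows)
    f0        -- initial resource on the n objects
    collected -- set of object indices the user has already collected
    """
    n = len(f0)
    # Final resource f = W f0, as in eq. (1).
    f = [sum(W[a][b] * f0[b] for b in range(n)) for a in range(n)]
    # Rank the uncollected objects by final resource, largest first.
    candidates = [a for a in range(n) if a not in collected]
    candidates.sort(key=lambda a: -f[a])
    return candidates[:L]


# Toy example with three objects; the values are illustrative only.
W = [[0.5, 0.2, 0.0],
     [0.3, 0.5, 0.4],
     [0.2, 0.3, 0.6]]
f0 = [1.0, 0.0, 0.0]          # the user has collected object 0 only
top = recommend(W, f0, collected={0}, L=2)
print(top)
```

Only objects outside `collected` enter the ranking, which mirrors the requirement that recommended objects have not yet been collected by the user.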

In the heat spreading algorithm [23], a less popular object with low degree obtains a larger
final resource and the recommendation list is diverse. The resource reallocation matrix is

$$ W^{H}_{\alpha\beta} = \frac{1}{k_\alpha} \sum_{i=1}^{m} \frac{a_{i\alpha} a_{i\beta}}{k_i}, \qquad (2) $$

where $k_\alpha$ is the degree of $o_\alpha$ and $k_i$ is the degree of $u_i$. In contrast, a
popular object with high degree receives more final resource in the ProbS algorithm and the
recommendation list is accurate. The resource reallocation matrix is [12]

$$ W^{P}_{\alpha\beta} = \frac{1}{k_\beta} \sum_{i=1}^{m} \frac{a_{i\alpha} a_{i\beta}}{k_i}. \qquad (3) $$

In order to solve the apparent accuracy-diversity dilemma of recommender systems, a hybrid
algorithm has been proposed [23], which combines these two algorithms as follows:

$$ W^{H+P}_{\alpha\beta} = \frac{1}{k_\alpha^{1-\lambda} k_\beta^{\lambda}} \sum_{i=1}^{m} \frac{a_{i\alpha} a_{i\beta}}{k_i}, \qquad (4) $$

so that $\lambda = 0$ recovers HeatS and $\lambda = 1$ recovers ProbS. The elegant hybrid
algorithm results in higher accuracy and greater diversity when the parameter $\lambda$ is tuned
to around an optimal value.
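
A minimal sketch of the hybrid reallocation matrix, assuming the standard form $W_{\alpha\beta} = k_\alpha^{-(1-\lambda)} k_\beta^{-\lambda} \sum_i a_{i\alpha} a_{i\beta} / k_i$, in which $\lambda = 0$ gives HeatS and $\lambda = 1$ gives ProbS. The function name `hybrid_matrix` and the toy adjacency matrix are hypothetical, and the dense double loop is meant for illustration rather than for data sets of realistic size.

```python
def hybrid_matrix(A, lam):
    """Build the n x n HeatS+ProbS matrix from an m x n 0/1 adjacency matrix A.

    lam = 0 gives HeatS and lam = 1 gives ProbS.
    """
    m, n = len(A), len(A[0])
    k_user = [sum(row) for row in A]                             # user degrees
    k_obj = [sum(A[i][a] for i in range(m)) for a in range(n)]   # object degrees
    W = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(n):
            if k_obj[a] == 0 or k_obj[b] == 0:
                continue  # objects without links exchange no resource
            overlap = sum(A[i][a] * A[i][b] / k_user[i]
                          for i in range(m) if k_user[i] > 0)
            W[a][b] = overlap / (k_obj[a] ** (1 - lam) * k_obj[b] ** lam)
    return W


# Toy graph: 3 users (rows) and 3 objects (columns).
A = [[1, 1, 0],
     [1, 0, 1],
     [1, 1, 1]]
W_heat = hybrid_matrix(A, 0.0)   # each row sums to one (averaging)
W_prob = hybrid_matrix(A, 1.0)   # each column sums to one (resource is conserved)
```

The row normalisation of HeatS and the column normalisation of ProbS give a quick sanity check for any implementation, provided every user and object has at least one link.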

The initial resource vector $f_0$ in many network-based recommendation algorithms, including the
HeatS+ProbS hybrid algorithm, is determined as follows [19-21, 23]:

$$ f^{i}_{0,\alpha} = a_{i\alpha}. \qquad (5) $$

That is to say, if object $o_\alpha$ has been collected by $u_i$, then its initial resource is
one; otherwise it is zero. It has been shown that a heterogeneous initial configuration of
resource

$$ f^{i}_{0,\alpha} = a_{i\alpha} k_\alpha^{\eta} \qquad (6) $$

can improve the recommendation accuracy of the ProbS algorithm [21]. The aim of this Letter is
to investigate the effect of the initial resource configuration on the recommendation
performance (accuracy and diversity) of the HeatS+ProbS hybrid algorithm.
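
A minimal sketch of the heterogeneous initial configuration, assuming eq. (6) assigns each collected object a resource equal to its degree raised to the power $\eta$. The function name `initial_resource` is hypothetical. With $\eta = 0$ the homogeneous configuration is recovered, and a negative $\eta$ gives popular objects less initial resource.

```python
def initial_resource(A, i, eta):
    """Initial resource of user i on every object: A[i][a] * k_a ** eta."""
    m, n = len(A), len(A[0])
    k_obj = [sum(A[u][a] for u in range(m)) for a in range(n)]
    # Uncollected objects get zero. A collected object has degree >= 1,
    # so the power is well defined even for negative eta.
    return [float(k_obj[a]) ** eta if A[i][a] else 0.0 for a in range(n)]


# Toy graph: object degrees are 3, 2 and 2.
A = [[1, 1, 0],
     [1, 0, 1],
     [1, 1, 1]]
f0_homogeneous = initial_resource(A, 0, 0.0)     # unit resource on collected objects
f0_heterogeneous = initial_resource(A, 0, -1.0)  # resource 1/k on collected objects
```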

Data. - Two benchmark datasets are adopted to test the performance of the recommendation
algorithm. The first dataset, MovieLens, is downloaded from the website of GroupLens Research
[35]. MovieLens users rate movies at five discrete levels from 1 to 5. The dataset contains
$n = 1682$ movies (objects), $m = 943$ users, and 100,000 ratings. If the rating of movie
$o_\alpha$ given by user $u_i$ is no less than 3, we consider that $u_i$ has collected
$o_\alpha$. This results in 82520 user-object pairs, and the sparsity of the bipartite network
is $82520/(943 \times 1682) \approx 0.0520$. The second dataset, Netflix, is a randomly selected
subset of the huge dataset provided for the Netflix Prize [IS]. It consists of $n = 6000$
objects, $m = 10000$ users, and 701749 links after a coarse-graining map from the five-level
rating to the unary form. The sparsity of the bipartite network is 0.0117.

In order to investigate the performance of the proposed recommendation algorithm, the links in
each data set are randomly divided into two subsets. The training set contains 90% of the links,
while the probe set $E^p$ contains the remaining 10%. The algorithm is run on the training set
to make recommendations, which are compared with the links in the probe set to evaluate
performance (accuracy and diversity) [23].
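
The random 90/10 division of links can be sketched as below. The function name `split_links` and the fixed seed are assumptions made for reproducibility of the example; the source does not say how the random split was generated.

```python
import random


def split_links(links, probe_fraction=0.1, seed=0):
    """Randomly split (user, object) links into a training set and a probe set."""
    rng = random.Random(seed)          # fixed seed only to make the example repeatable
    shuffled = list(links)
    rng.shuffle(shuffled)
    n_probe = int(round(probe_fraction * len(shuffled)))
    return shuffled[n_probe:], shuffled[:n_probe]


# Toy data: 10 users who each collected 10 objects.
links = [(i, a) for i in range(10) for a in range(10)]
train, probe = split_links(links)
print(len(train), len(probe))
```

Note that removing probe links can leave an object or user without any training link, so an implementation of the reallocation matrix should tolerate zero degrees.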

Accuracy of recommendation. — We utilize three 
measures for the quantification of the recommendation ac- 
curacy. The first measure is the ranking score, which is 
defined as follows dni 
where $|E^p|$ is the number of links in the probe set, and $q_{i\alpha}$ is the position of
$o_\alpha$ in user $u_i$'s recommendation list. If the objects ranked from $q_1$ to $q_2$ in the
list have the same score as $o_\alpha$, then $q_{i\alpha} = (q_1 + q_2)/2$ [13]. The smaller $r$
is, the more accurate the algorithm.
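
A minimal sketch of the ranking score with tie averaging. It assumes that each position $q_{i\alpha}$ is divided by the number of objects that user $u_i$ has not collected, i.e. $r = \frac{1}{|E^p|} \sum_{(i,\alpha) \in E^p} q_{i\alpha} / (n - k_i)$; this normalisation is common in the literature but is an assumption here. The function names are hypothetical.

```python
def tie_averaged_position(scores, target):
    """Position of `target` when objects are sorted by descending score.

    scores -- dict mapping each uncollected object to its final resource
    Tied objects share the average of the first and last tied positions.
    """
    s = scores[target]
    higher = sum(1 for v in scores.values() if v > s)
    tied = sum(1 for v in scores.values() if v == s)   # includes the target
    q1, q2 = higher + 1, higher + tied
    return (q1 + q2) / 2


def ranking_score(probe_links, scores_by_user):
    """Average normalised position of the probe links (smaller is better).

    Assumption: each position is divided by the number of objects the
    user has not collected in the training set.
    """
    total = 0.0
    for user, obj in probe_links:
        scores = scores_by_user[user]
        total += tie_averaged_position(scores, obj) / len(scores)
    return total / len(probe_links)


# Toy example: objects 'b' and 'c' are tied at positions 2 and 3.
scores = {"a": 0.5, "b": 0.3, "c": 0.3, "d": 0.1}
print(tie_averaged_position(scores, "b"))
print(ranking_score([(0, "a"), (0, "d")], {0: scores}))
```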

Fig. 1: (Color online) Contour plots of the three recommendation accuracy measures with respect
to the two parameters $\lambda$ and $\eta$. The first row is for the MovieLens data and the
second row for the Netflix data. The three columns correspond to the ranking score
$r(\lambda, \eta)$, the precision $P(\lambda, \eta)$, and the recall $R(\lambda, \eta)$. For the
precision and recall, the length of the recommendation list is $L = 50$. The contour lines
tangent to the line $\eta = 0$ are the HP lines for the three accuracy measures.

Plots (a) and (d) of fig. 1 present the contours of the $r(\lambda, \eta)$ functions for the
MovieLens data and the Netflix data, showing the dependence of the ranking score $r$ on the two
parameters $\lambda$ and $\eta$. Note that the results for the HeatS+ProbS hybrid algorithm are
given by $\eta = 0$. For simplicity, we call the contour line whose ranking score equals the
minimum obtained at the optimal $\lambda$ in the HeatS+ProbS hybrid algorithm the HP line for
ranking score,

$$ r(\lambda, \eta) = r_{\mathrm{HP,min}} = \min_{\lambda} r(\lambda, \eta = 0). \qquad (8) $$

In other words, our algorithm with the parameters $\lambda$ and $\eta$ lying on the HP line has
the same performance as the original HeatS+ProbS hybrid algorithm. When the parameter point
$(\lambda, \eta)$ falls within the HP line such that $r(\lambda, \eta) < r_{\mathrm{HP,min}}$,
our algorithm outperforms the HeatS+ProbS hybrid algorithm. Conversely, the HeatS+ProbS hybrid
algorithm performs better when the parameter point is located outside the HP line, that is,
$r(\lambda, \eta) > r_{\mathrm{HP,min}}$.
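
On a discrete parameter grid, the HP baseline and the region it encloses can be found as sketched below. The function name `hp_region` and the grid values are hypothetical; the scores are synthetic and are not results from the paper.

```python
def hp_region(r_grid):
    """Return r_HP,min and the grid points whose ranking score beats it.

    r_grid -- dict mapping (lam, eta) to the ranking score r(lam, eta);
              it must contain at least one point with eta == 0.
    """
    # Baseline: best ranking score of the original hybrid algorithm (eta = 0).
    r_hp_min = min(r for (lam, eta), r in r_grid.items() if eta == 0)
    # Points strictly inside the HP line have a smaller ranking score.
    inside = sorted(p for p, r in r_grid.items() if r < r_hp_min)
    return r_hp_min, inside


# Synthetic scores on a tiny grid, for illustration only.
r_grid = {
    (0.0, 0): 0.30, (0.5, 0): 0.20, (1.0, 0): 0.25,
    (0.0, -1): 0.18, (0.5, -1): 0.15, (0.5, 1): 0.22,
}
baseline, improving = hp_region(r_grid)
print(baseline, improving)
```

For precision and recall the same idea applies with `max` in place of `min` and `>` in place of `<`.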

For the MovieLens data, the ranking score reaches its minimum $r_{\min} = 0.079$ when
$\lambda = \lambda_{\mathrm{opt}} = 0.26$ and $\eta = \eta_{\mathrm{opt}} = -0.71$. Compared
with the minimal ranking score $r_{\mathrm{HP,min}} = 0.084$ at the optimal
$\lambda = \lambda_{\mathrm{HP,opt}} = 0.16$ for the HeatS+ProbS hybrid algorithm, we gain an
improvement in recommendation accuracy of $1 - r_{\min}/r_{\mathrm{HP,min}} = 6.0\%$. For the
Netflix data, we have $r_{\min} = 0.039$ when $\lambda_{\mathrm{opt}} = 0.21$ and
$\eta_{\mathrm{opt}} = -0.51$, and $r_{\mathrm{HP,min}} = 0.045$ when
$\lambda_{\mathrm{HP,opt}} = 0.23$. We gain an improvement in recommendation accuracy of 12.8%.
It is found that $\lambda_{\mathrm{opt}}$ is close but not necessarily equal to
$\lambda_{\mathrm{HP,opt}}$.

The second measure is the recommendation precision, which is defined as follows [23]:

$$ P = \frac{1}{m} \sum_{i=1}^{m} \frac{d_{iL}}{L}, \qquad (9) $$

where $d_{iL}$ is the number of user $u_i$'s deleted links contained in the top $L$ objects of
that user's recommendation list. Plots (b) and (e) of fig. 1 illustrate the contour lines of the
precision functions $P(\lambda, \eta)$ with $L = 50$ for the two data sets. Analogous to the HP
line for ranking score, we can define the HP line for precision,

$$ P(\lambda, \eta) = P_{\mathrm{HP,max}} = \max_{\lambda} P(\lambda, \eta = 0). \qquad (10) $$

From fig. 1 we observe significant improvements achieved by our algorithm.



For the MovieLens data, we find that $P_{\max} = 0.0904$ when
$\lambda = \lambda_{\mathrm{opt}} = 0.31$ and $\eta = \eta_{\mathrm{opt}} = -0.69$ in our
algorithm, while $P_{\mathrm{HP,max}} = 0.0865$ at the optimal
$\lambda = \lambda_{\mathrm{HP,opt}} = 0.30$ for the original HeatS+ProbS hybrid algorithm. We
gain an improvement in recommendation accuracy of $P_{\max}/P_{\mathrm{HP,max}} - 1 = 4.3\%$.
For the Netflix data, we have $P_{\max} = 0.0593$ when $\lambda_{\mathrm{opt}} = 0.21$ and
$\eta_{\mathrm{opt}} = -0.39$, and $P_{\mathrm{HP,max}} = 0.0564$ when
$\lambda_{\mathrm{HP,opt}} = 0.20$. We gain an improvement in recommendation accuracy of 5.1%.

The third measure is the recall, which is defined as follows [23]:

$$ R = \frac{1}{m} \sum_{i=1}^{m} \frac{d_{iL}}{l_i}, \qquad (11) $$

where $d_{iL}$ is the number of user $u_i$'s deleted links contained in the top $L$ objects, and
$l_i$ is the number of user $u_i$'s deleted links. Plots (c) and (f) of fig. 1 illustrate the
contour lines of the recall functions $R(\lambda, \eta)$ with $L = 50$ for the two data sets.
Similarly, we can define the HP line for recall,

$$ R(\lambda, \eta) = R_{\mathrm{HP,max}} = \max_{\lambda} R(\lambda, \eta = 0). \qquad (12) $$

From fig. 1 we also observe significant improvements achieved by our algorithm.
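
Precision and recall for lists of length $L$ can be computed together, as in the sketch below. The function name `precision_recall` is hypothetical, and one convention is assumed: only users with at least one probe link enter the averages, since the recall of a user without deleted links is undefined.

```python
def precision_recall(top_lists, probe_by_user, L):
    """Mean precision and recall over users for recommendation lists of length L.

    top_lists     -- dict: user -> ranked list of recommended objects
    probe_by_user -- dict: user -> set of that user's deleted (probe) objects
    Assumption: users without probe links are left out of both averages.
    """
    users = [u for u in top_lists if probe_by_user.get(u)]
    p = r = 0.0
    for u in users:
        hits = len(set(top_lists[u][:L]) & probe_by_user[u])   # deleted links in top L
        p += hits / L
        r += hits / len(probe_by_user[u])
    return p / len(users), r / len(users)


# Toy example with two users and L = 2.
top_lists = {0: [1, 2, 3, 4], 1: [5, 6, 7, 8]}
probe_by_user = {0: {2, 9}, 1: {5}}
print(precision_recall(top_lists, probe_by_user, L=2))
```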

For the MovieLens data, we have $R_{\max} = 0.559$ when $\lambda = \lambda_{\mathrm{opt}} = 0.31$
and $\eta = \eta_{\mathrm{opt}} = -0.51$ in our algorithm, while $R_{\mathrm{HP,max}} = 0.548$
when $\lambda = \lambda_{\mathrm{HP,opt}} = 0.29$ for the original HeatS+ProbS hybrid algorithm.
We gain an improvement in recommendation accuracy of $R_{\max}/R_{\mathrm{HP,max}} - 1 = 2.0\%$.
For the Netflix data, we have $R_{\max} = 0.439$ when $\lambda_{\mathrm{opt}} = 0.21$ and
$\eta_{\mathrm{opt}} = -0.29$, and $R_{\mathrm{HP,max}} = 0.430$ when
$\lambda_{\mathrm{HP,opt}} = 0.21$. We gain an improvement in recommendation accuracy of 2.1%.

Diversity of recommendation. - We adopt two measures to characterize the diversity of
recommendations, the intra-user diversity $D_{\mathrm{intra}}$ and the inter-user diversity
$D_{\mathrm{inter}}$.

The intra-user diversity characterizes the average dissimilarity among the top $L$ objects in a
single user's list, denoted $C_i$. The similarity between two objects $o_\alpha$ and $o_\beta$
can be measured by the Sørensen index [27]
and the intra-user diversity of user $u_i$'s recommendation list of length $L$ can be defined as
[21, 28]

$$ D^{i}_{\mathrm{intra}} = 1 - \frac{1}{L(L-1)} \sum_{\alpha \neq \beta} s_{\alpha\beta}, \qquad (14) $$

where $s_{\alpha\beta}$ is the similarity between $o_\alpha$ and $o_\beta$, with
$o_\alpha \in C_i$ and $o_\beta \in C_i$, and the average intra-user diversity is

$$ D_{\mathrm{intra}} = \frac{1}{m} \sum_{i=1}^{m} D^{i}_{\mathrm{intra}}. \qquad (15) $$

Note that $C_i$ and $C_j$ of any two users are usually different, and thus the values
$D^{i}_{\mathrm{intra}}$ and $D^{j}_{\mathrm{intra}}$ differ from one user to another. A greater
or lesser value of the intra-user diversity means higher or lower novelty of a single user's
recommendation list.

The inter-user diversity indicates the uniqueness of different users' recommendation lists,
which can be calculated as follows [21, 23]:

$$ D_{\mathrm{inter}} = \frac{1}{m(m-1)} \sum_{i \neq j} \left( 1 - \frac{|C_i \cap C_j|}{L} \right), \qquad (16) $$

where $|C_i \cap C_j|$ is the number of common objects in the top $L$ of the two lists $C_i$ and
$C_j$. The inter-user diversity is a measure of the personalization of the recommendation
algorithm. A greater or lesser value of the inter-user diversity means higher or lower
personalization of users' recommendation lists.

Fig. 2: (Color online) Intra-user diversity (a, c) and inter-user diversity (b, d) as a function
of $\lambda$ and $\eta$ for MovieLens (a, b) and Netflix (c, d). The length of the
recommendation list is $L = 50$. The results for the original HeatS+ProbS hybrid algorithm are
given with $\eta = 0$.

Figure 2 shows the dependence of the intra-user diversity and the inter-user diversity on
$\lambda$ and $\eta$ for the MovieLens and Netflix datasets. The length of the recommendation
list is $L = 50$. The four plots share several similar features. For a fixed value of $\eta$,
the diversity decreases with increasing $\lambda$. This is expected since the HeatS part of the
hybrid algorithm dominates when $\lambda$ is small. For a fixed value of $\lambda$, the
diversity decreases with increasing $\eta$, which can be understood as follows: for larger
values of $\eta$, more resource is put on the objects with larger degrees and the algorithm
favors the ProbS part. Combined with the results in fig. 1, our algorithm with negative $\eta$
values can improve both the accuracy and diversity of the recommendation. This finding is
consistent with the results in ref. [21], in which $\lambda = 1$.
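
The two diversity measures can be sketched as follows. Several points are assumptions rather than details confirmed by the text: the intra-user diversity is taken as one minus the mean pairwise similarity of the listed objects, the similarity is the Sørensen index in its usual form $2|U_\alpha \cap U_\beta| / (|U_\alpha| + |U_\beta|)$ over the sets of users who collected each object, and the inter-user diversity is the mean of $1 - |C_i \cap C_j|/L$ over user pairs. All function names are hypothetical.

```python
from itertools import combinations


def sorensen(users_a, users_b):
    """Sørensen index of two objects, given the sets of users who collected them."""
    return 2 * len(users_a & users_b) / (len(users_a) + len(users_b))


def intra_user_diversity(top_list, collectors, similarity=sorensen):
    """One minus the mean pairwise similarity of the objects in one list."""
    pairs = list(combinations(top_list, 2))
    mean_sim = sum(similarity(collectors[a], collectors[b]) for a, b in pairs) / len(pairs)
    return 1 - mean_sim


def inter_user_diversity(top_lists, L):
    """Mean of 1 - |C_i & C_j| / L over all pairs of users' top-L lists."""
    pairs = list(combinations(top_lists, 2))
    return sum(1 - len(set(ci[:L]) & set(cj[:L])) / L for ci, cj in pairs) / len(pairs)


# Toy example: `collectors` maps each object to the users who collected it.
collectors = {1: {0, 1}, 2: {1, 2}, 3: {3}}
print(intra_user_diversity([1, 2, 3], collectors))
print(inter_user_diversity([[1, 2], [2, 3], [4, 5]], L=2))
```

The `similarity` argument is left pluggable so that another index, such as the cosine similarity, can be substituted without changing the rest of the code.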



Performance comparison of recommendation algorithms. - We compare the performance of four
recommendation algorithms: HeatS, ProbS, HeatS+ProbS, and HeatS+ProbS with heterogeneous initial
configuration (HPIC). The comparison is based on three accuracy measures (ranking score $r$,
precision $P$, and recall $R$) and two diversity measures (intra-user diversity
$D_{\mathrm{intra}}$ and inter-user diversity $D_{\mathrm{inter}}$) using the MovieLens and
Netflix data sets, respectively. The parameter $\lambda$ of the HeatS+ProbS hybrid algorithm is
tuned to minimize the ranking score, and so are the two parameters of the HPIC algorithm. For
the MovieLens data, we have $\lambda_{\mathrm{HP,opt}} = 0.16$ for the HeatS+ProbS algorithm and
$\lambda_{\mathrm{opt}} = 0.26$ and $\eta_{\mathrm{opt}} = -0.71$ for the HPIC algorithm. For
the Netflix data, we have $\lambda_{\mathrm{HP,opt}} = 0.23$ for the HeatS+ProbS algorithm and
$\lambda_{\mathrm{opt}} = 0.21$ and $\eta_{\mathrm{opt}} = -0.51$ for the HPIC algorithm.

Table 1 shows the results. For both data sets, the ranking score $r$ decreases and the precision
$P$ increases from left to right. The recall $R$ also increases from left to right, except that
for the Netflix data the recall of HPIC (0.426) is slightly smaller than that of HeatS+ProbS
(0.429). This means that the recommendation accuracy generally improves from HeatS to ProbS to
HeatS+ProbS to HPIC. Concerning the recommendation diversity, the HeatS algorithm generally
gives the largest diversity values and the ProbS algorithm results in the smallest diversity
values. The improvement in recommendation diversity after introducing a heterogeneous initial
configuration in the HeatS+ProbS hybrid algorithm is marginal for the MovieLens data. However,
we can observe a significant increase in the two diversity measures for the Netflix data.
Therefore, we can conclude that introducing a heterogeneous initial configuration in the
HeatS+ProbS hybrid algorithm can markedly improve the recommendation accuracy and, to a greater
or lesser extent, increase the recommendation diversity.



Table 1: Performance comparison of different recommendation algorithms according to each of the
five metrics: ranking score $r$, precision $P$, recall $R$, intra-user diversity
$D_{\mathrm{intra}}$, and inter-user diversity $D_{\mathrm{inter}}$. For the MovieLens data,
$\lambda_{\mathrm{HP,opt}} = 0.16$ for the HeatS+ProbS algorithm and
$\lambda_{\mathrm{opt}} = 0.26$ and $\eta_{\mathrm{opt}} = -0.71$ for our algorithm (HPIC). For
the Netflix data, $\lambda_{\mathrm{HP,opt}} = 0.23$ for the HeatS+ProbS algorithm and
$\lambda_{\mathrm{opt}} = 0.21$ and $\eta_{\mathrm{opt}} = -0.51$ for our algorithm.

Data set    Metric     HeatS    ProbS    HeatS+ProbS    HPIC
MovieLens   r          0.149    0.106    0.084          0.079
            P          0.023    0.074    0.084          0.089
            R          0.130    0.476    0.501          0.544
            D_intra    0.932    0.638    0.699          0.694
            D_inter    0.862    0.618    0.853          0.867
Netflix     r          0.107    0.050    0.045          0.039
            P          0.014    0.050    0.056          0.059
            R          0.022    0.385    0.429          0.426
            D_intra    0.995    0.598    0.641          0.721
            D_inter    0.788    0.462    0.624          0.780

Dependence of algorithm accuracy on the object degree. - The above investigation focuses on the
macroscopic performance of the recommendation algorithms. It is helpful to understand the
recommendation algorithm at the microscopic level by studying the dependence of algorithm
accuracy on the object degree [23, 24]. To do so, the entries in each 10% probe set are sorted
in descending order of object degree. Four new probe sets, each containing 1000 links, are
extracted from the 10% probe set: the most popular objects with the highest degrees, popular
objects with high degrees starting from one fourth of the sorted link sequence, unpopular
objects with low degrees starting from the middle of the sorted sequence, and the least popular
objects with the lowest degrees, respectively. The average ranking scores corresponding to these
four probe sets are calculated for the HPIC algorithm with different values of the two
parameters. The results for popular and unpopular objects are illustrated in fig. 3 with
contours.
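
The extraction of the four degree-based probe sets can be sketched as below. The function name `degree_probe_subsets` and the dictionary keys are hypothetical. The sketch assumes the probe set holds at least four times `size` links so that the subsets do not overlap, and it leaves open whether object degrees are measured on the training set or on the full data, which the text does not specify.

```python
def degree_probe_subsets(probe_links, k_obj, size=1000):
    """Extract four probe subsets of `size` links, ordered by object degree.

    probe_links -- list of (user, object) links in the probe set
    k_obj       -- dict: object -> degree
    """
    # Sort the probe links by the degree of their object, largest first.
    ordered = sorted(probe_links, key=lambda link: -k_obj[link[1]])
    n = len(ordered)
    return {
        "highest": ordered[:size],                    # most popular objects
        "high": ordered[n // 4: n // 4 + size],       # from one fourth of the sequence
        "low": ordered[n // 2: n // 2 + size],        # from the middle of the sequence
        "lowest": ordered[n - size:],                 # least popular objects
    }


# Toy example: 8 probe links whose objects have degrees 8, 7, ..., 1.
probe_links = [(0, a) for a in range(8)]
k_obj = {a: 8 - a for a in range(8)}
subsets = degree_probe_subsets(probe_links, k_obj, size=1)
print(subsets)
```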

For popular objects, according to fig. 3(a) and (e), the ranking score is negatively correlated
with $\lambda$ for fixed $\eta$, and the minimal ranking score is reached at
$\lambda \approx 1$ and $\eta \approx 0.5$. The recommendation performance remains good when
$\lambda$ is large and $\eta$ is positive, which is particularly evident for the Netflix data,
see fig. 3(e). This observation is consistent with the fact that popular objects are more likely
to be recommended when the ProbS algorithm dominates and/or popular objects are configured with
more initial resources ($\eta > 0$). For unpopular objects, according to fig. 3(d) and (h), the
ranking score is positively correlated with $\lambda$ for fixed $\eta$, and the minimal ranking
score is reached at $\lambda \approx 0$ and $\eta \approx -1$. This finding is consistent with
the fact that unpopular objects become more likely to be recommended when the HeatS algorithm
dominates and/or unpopular objects are configured with more initial resources ($\eta < 0$).

Comparing the plots from fig. 3(a) to fig. 3(d) for MovieLens or from fig. 3(e) to fig. 3(h) for
Netflix, the overall ranking score is lower for objects of higher degree, which is expected
since popular objects are, by definition, collected more frequently by users. For each data set,
the optimal point $(\lambda_{\mathrm{opt}}, \eta_{\mathrm{opt}})$ corresponding to the minimum
ranking score in the investigated region $(\lambda, \eta) \in [0, 1] \times [-5, 5]$ moves from
northeast to southwest as the object degree decreases. The optimal parameter values
$\lambda_{\mathrm{opt}}$ and $\eta_{\mathrm{opt}}$ are listed in table 2.



Table 2: Optimal parameter values $\lambda_{\mathrm{opt}}$ and $\eta_{\mathrm{opt}}$ in the
investigated region $(\lambda, \eta) \in [0, 1] \times [-5, 5]$ for four probe sets with
different object degrees.

Object degree    MovieLens             Netflix
                 λ_opt     η_opt       λ_opt     η_opt
Highest          1.0       0.4         1.0       0.5
High             0.5       -0.4        0.5       -0.3
Low              0.2       -0.6        0.2       -0.4
Lowest           0.0       -1.0        0.0       -1.0

Conclusion. - In this work, we have proposed to use a heterogeneous initial resource
configuration in the HeatS+ProbS hybrid recommendation algorithm. An additional parameter
$\eta$ is introduced in this algorithm. We investigated the recommendation performance using
three accuracy measures and two diversity measures tested on two benchmark data sets, MovieLens
and Netflix. Numerical experiments indicate that assigning less initial resource to popular
objects and more initial resource to unpopular objects provides systematic improvements in these
measures. More interestingly from the practical point of view, our algorithm is robust, since
the parameter region enclosed by the so-called HP line is broad.

In order to understand the behavior of the proposed recommender system at the microscopic level,
we investigated the recommendation accuracy of objects with different degrees. We found that the
recommendation accuracy is sensitive to both parameters. Popular objects with high degrees have
higher recommendation accuracy when the ProbS part dominates ($\lambda = 1$) and popular objects
are assigned more initial resource ($\eta > 0$), while unpopular objects are more accurately
recommended when the HeatS part dominates ($\lambda = 0$) and popular objects are assigned less
initial resource ($\eta < 0$).
Fig. 3: (Color online) Contour plots of the ranking score $r(\lambda, \eta)$ for four
circumstances by averaging the ranking scores of 1000 entries with different object degrees for
MovieLens (a-d) and Netflix (e-h). The object degree decreases from left to right.



In summary, introducing heterogeneity in the initial configuration of resource on objects can
improve the recommendation performance of the HeatS+ProbS hybrid algorithm, which is the best
network-based recommendation algorithm to date in which both accuracy and diversity are taken
into consideration. The complexity of recommender systems uncovered in this work highlights the
possibility of further improvements in algorithm design.